Amino Acid Classification and Hash Seeds for Homology Search

Authors

  • Weiming Li
  • Bin Ma
  • Kaizhong Zhang
Abstract

Spaced seeds have been studied extensively in the field of homology search. A spaced seed can be regarded as a very special type of hash function on k-mers: two k-mers have the same hash value if and only if they are identical at the w (w < k) positions designated by the seed. Spaced seeds substantially increase the sensitivity of homology search. It is therefore natural to ask whether there is a better hash function (called a hash seed) that provides higher sensitivity than the spaced seed. We study this question in this paper. We propose a strategy for classifying amino acids that leads to a better hash seed. Our results raise a new question: how to design the best hash seed.
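The hash-function view of a spaced seed described in the abstract can be illustrated with a minimal Python sketch. The function names, the example seed `11011`, and the amino acid grouping in `GROUPS` are all assumptions made for illustration; in particular, the grouping is a hypothetical one and is not the classification proposed in the paper. The second function shows only one possible way a hash on k-mers could be looser than exact matching at the seed positions.

```python
def spaced_seed_hash(kmer, seed):
    """Hash a k-mer by keeping only the characters at the seed's '1' positions.

    Two k-mers get the same value if and only if they agree at those positions.
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(c for c, s in zip(kmer, seed) if s == "1")


# Hypothetical grouping of the 20 amino acids, for illustration only.
# It is NOT the classification from the paper.
GROUPS = ["AGST", "ILMV", "FWY", "DE", "KRH", "NQ", "C", "P"]
CLASS_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def class_seed_hash(kmer, seed):
    """Hash a k-mer by the class index of each residue at the seed's '1' positions.

    Two k-mers collide when their residues fall into the same classes there,
    even if the residues themselves differ.
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return tuple(CLASS_OF[c] for c, s in zip(kmer, seed) if s == "1")


if __name__ == "__main__":
    seed = "11011"  # weight w = 4, length k = 5
    # Differ only at the "don't care" position: same spaced-seed hash.
    print(spaced_seed_hash("ACDEF", seed) == spaced_seed_hash("ACWEF", seed))
    # Differ at seed positions but stay within the same classes.
    print(spaced_seed_hash("ILDKF", seed) == spaced_seed_hash("VMEKY", seed))
    print(class_seed_hash("ILDKF", seed) == class_seed_hash("VMEKY", seed))
```

Under this assumed grouping, the last two k-mers are missed by the spaced seed but hit by the class-based hash, which is the kind of trade-off between sensitivity and random hits that a hash seed design has to weigh.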

Similar resources

Multiple spaced seeds for homology search

MOTIVATION Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smi...

Subset Seed Extension to Protein BLAST

Abstract: The seeding technique became central in the theory of sequence alignment and there are several efficient tools applying seeds to DNA homology search. Recently, a concept of subset seeds has been proposed for similarity search in protein sequences. We experimentally evaluate the applicability of subset seeds to protein homology search. We advocate the use of multiple subset seeds de...

In Silico Analysis of Glutaminase from Different Species of Escherichia and Bacillus

Background: Glutaminase (EC 3.5.1.2) catalyzes the hydrolytic degradation of L-glutamine to L-glutamic acid and has been introduced for cancer therapy in recent years. The present study was an in silico analysis of glutaminase to further elucidate its structure and physicochemical properties. Methods: Forty glutaminase protein sequences from different species of Escherichia and Bacillus obtained...

Identification of amino acids in Securigera securidaca, a popular medicinal herb in Iranian folk medicine

Securigera securidaca (L.) Degen & Dorfl grows in different parts of Iran. The seeds of the species are used in Iranian folk medicine as an anti-diabetic agent. Many studies have established hypoglycemic effects of amino acids and in the present investigation, amino acids of Securigera securidaca seeds have been evaluated. The ground seeds were extracted using petroleum ether,...

A Large-scale Batch-learning Self-organizing Map for Function Prediction of Poorly-characterized Proteins Progressively Accumulating in Sequence Databases

Homology searches for nucleotide and amino-acid sequences have been used widely to predict functions of genes and proteins when genomes are decoded and thus become a basic bioinformatics tool. Whereas usefulness of the sequence homology search is apparent, it has become increasingly clear that homology search can predict the protein function of only 50% of genes, or fewer, when a novel genome i...

Publication date: 2009